I. INTRODUCTION

The reconstruction of a quantum state from experi- 
mental data is generally imprecise for a variety of rea- 
sons. Measurement devices work with limited accuracy; 
the sample size is finite; and the set of observables used 
might not be informationally complete. Rather than sin- 
gling out a quantum state uniquely, such imperfect data 
are generally compatible with an infinite set of different 
states. Identifying the most plausible member of this set 
can be a formidable challenge that requires advanced statistical
estimation techniques [1]. For this task there exist numerous
theoretical schemes, some widely used examples being pattern
functions [2, 3], maximum likelihood [4]-[6] or - in case of
informational incompleteness - the maximum entropy method [7]-[10].

Yet while, e.g., maximum likelihood estimation has 
proven successful in many practical applications, it 
reaches a theoretical limit and may even lead to inconsis- 
tencies when applied to special situations: For instance, 
it may return zero eigenvalues for a density matrix when, 
in fact, this is just an artefact caused by a small sample
size [11]. This suggests that prior knowledge beyond
the "naked" data may be important, and has motivated
Bayesian modifications to the conventional scheme [12].
Another shortcoming of many algorithms is that they
do not quantify error bars, a piece of information that
would be crucial to assess the reliability of an estimate;
here, too, important theoretical developments are currently
underway [11, 13].

In addition to their conceptual limitations most recon- 
struction schemes run into practical difficulties as soon 
as quantum systems become more complex, as the re- 
quirements for experimental and computational resources 
grow exponentially with the number of constituents of a 
composite system. For example, Häffner et al. [14] report
that, to reconstruct an entangled state of eight calcium
ions, they needed to perform 656,100 experiments. This
has triggered an intense search for alternative quantum
tomography protocols that can do with fewer resources.
Recently a number of proposals have been made for efficient
protocols whose resource requirements grow only
polynomially with system size. All these proposals are 
limited in scope: Either they do without complete state 
reconstruction and instead focus on ascertaining some 
specific property of the state such as the presence or ab- 
sence of entanglement [13, [3; or else they presuppose 
that the unknown state lie in some lower-dimensional re- 
construction subspace [l^ or belong to some privileged 
subclass of states such as matrix product states, with a 
number of parameters which is only polynomial in system 
size [20]-[22]. Protocols of the latter type deliver one or
both of the following: (i) the information whether or not 
an unknown state does indeed belong (within some given 
error bound) to the privileged subclass; and (ii) if so, an 
estimate for the associated state parameters. While this 
falls short of tomography for arbitrary states it is nevertheless
of great use for many real-world situations as, e.g.,
matrix product states capture the low energy physics of
a wide range of one-dimensional systems.

The recent proposals illustrate that efficiency gains can 
be achieved if one makes clever use of prior knowledge 
about the system — such as in the above example, the 
well-founded expectation that the system is described 
by a matrix product state. Prior knowledge thus plays 
a pivotal, and often underappreciated, role in quantum 
state tomography. Its use in a tomographic scheme not 
only ensures internal consistency and in particular avoids 
the zero-eigenvalue problem encountered in conventional 
maximum likelihood; but it is also key to realizing sorely 
needed efficiency gains. In most experimental settings 
such prior knowledge is in fact available, in the form of 
well-founded (implicit or explicit) expectations as to the 
output state. These expectations might be based on a 
theoretical model of the underlying physics, past results 
of the same or similar experiments, or some combination 
thereof; and they may specify an anticipated output state 
uniquely, or merely its parametric form. 

The situation where prior expectations merely favor 
a certain subclass of states, without any bias as to the 
parameter values within this subclass, can in general be 
reduced to the case where one anticipates some unique 
output state. For as long as there is a polynomial-time 
algorithm for finding within the subclass the member 
which best matches some given experimental data - as 
is the case for matrix product states - one can always
pre-process the data to obtain the best match within the 
subclass, and subsequently consider this best match to 
be the unique anticipated state. In many realistic cases, 
therefore, one may assume that prior knowledge comes 
in the form of some unique anticipated state. 

While there is thus an expectation as to the output 
state, it is obviously not guaranteed that precisely this 
anticipated state will be produced in the experiment — 
after all, if this were certain a priori, what would be the 
point of the experiment? The anticipated state comes 
only with a finite (and often not precisely quantified) de- 
gree of confidence: It is well possible that due to unwar- 
ranted approximations in the theoretical model or exper- 
imental inaccuracies the actual output state will deviate 
from the anticipated state. This deviation is not expected 
to be very large, for else there would be a serious prob- 
lem with the theory or experimental setup; but a quan- 
titative bound is difficult to establish a priori. Strictly 
speaking, therefore, the prior expectation is that the ac- 
tual state lies within some unspecified, yet not too large 
neighborhood of the anticipated state. Often the precise 
magnitude of this deviation constitutes one of the novel 
insights provided by the experiment. 

In this generic setting quantum state tomography re- 
duces to the following tasks: (i) verify that the unknown 
state of the system is indeed in the proximity of (but 
not necessarily identical with) the anticipated state; (ii) 
whenever it deviates from the anticipated state, prescribe 
how this initial estimate must be amended; and (iii) 
quantify the error bars associated with the updated esti- 
mate. The recent proposals mentioned above accomplish 
these tasks in part: They do supply efficient protocols, 
yet only for the special case where the anticipated state 
is of matrix product form, and only for the first of the 
three tasks. Against this backdrop, the purpose of the 
present paper is to propose a universal protocol with a 
significantly broader scope: a protocol which takes into 
account prior knowledge in a generic way, allowing arbi- 
trary anticipated states, and which tackles efficiently all 
three tasks at once. 

In order to achieve this objective I will borrow powerful
techniques from the classical field of image reconstruction
[23, 24]. There, too, a large amount of noisy
(pixel) data needs to be processed with sophisticated al- 
gorithms to yield the most plausible tomographic image. 
Like in quantum state tomography, one key to improving 
the efficiency and accuracy of these algorithms has been 
to use whatever prior knowledge is available, however 
qualitative or vague. Mathematically, this is achieved by 
modelling prior knowledge with distributions that con- 
tain themselves unknown parameters; these so-called hy- 
perparameters are then themselves subjected to an es- 
timation procedure. In its most general form such a 
scheme is known as hierarchical Bayes. Practical calcula- 
tions usually involve well-controlled approximations, the 
most important being the evidence procedure [25]-[33]. It
entails a maximum likelihood approximation applied to
the estimation of hyperparameters and is therefore also
called generalized maximum likelihood or ML-II.

Transferring this idea to quantum state tomography, 
the unknown degree of confidence as to the anticipated 
state becomes a hyperparameter. This hyperparameter 
can only be estimated a posteriori based on the overall 
compatibility of experimental data with the anticipated 
state. The estimation involves a generalized maximum 
likelihood approximation which, as it pertains to a one- 
dimensional distribution only, is usually of good quality 
— more so than the conventional maximum likelihood 
approximation which pertains to a higher-dimensional, 
and hence generally broader, distribution on state space. 
The thus estimated hyperparameter establishes whether 
or not the unknown state of the system is indeed in the 
proximity of the anticipated state; and if so, it determines 
the optimal weight to be attributed to anticipated state 
and experimental data, respectively, yielding a posterior 
estimate that interpolates between the two. The proce- 
dure is optimal in the sense that it maximizes the degree 
of confidence about the resulting posterior estimate; and 
it is efficient in that it does so with high accuracy even 
when measurements are not informationally complete. 

In the present paper I flesh out these ideas in math- 
ematical detail and demonstrate their use in a concrete 
example. I shall first adapt the framework of the classical 
evidence procedure to its novel use in quantum state to- 
mography. I will establish the criteria for its applicability 
and show that it is in fact particularly suited for tomog- 
raphy of large quantum systems. I will further show that 
the evidence procedure is internally consistent and in par- 
ticular avoids the zero-eigenvalue problem encountered in 
conventional maximum likelihood; that it greatly speeds 
up tomography (certification or incremental amendment 
of an arbitrary anticipated state) whenever the experi- 
mental data are sufficiently close to prior expectations 
overall; and that it provides error bars in a straightfor- 
ward fashion. I will then illustrate its use in a simple 
four-qubit system.

For simplicity I shall disregard inaccuracies introduced 
by the experimental apparatus and instead focus on the 
two other sources of imprecision, finite sample size and 
lack of informational completeness. Incorporating exper- 
imental errors into the formalism is a straightforward ex- 
ercise that will be dealt with in future work. 

In Section II I introduce the evidence procedure, already
in a mathematical form adapted to quantum state
tomography. In Section III I illustrate its use with a toy
example, a four-qubit system with parameters tuned (for
simplicity) such that all calculations can be done analytically.
In Section IV I conclude with a brief discussion.



II. EVIDENCE PROCEDURE 

Suppose one wants to infer the state $\rho$ of a $d$-dimensional
quantum system from a set of $r$ sample means $\{g_a\}$ measured
on a sample of size $N$. Since $N$ is always finite, and the set of
observables $\{G_a\}$ need not be informationally complete, these
data do not specify $\rho$ uniquely but only yield some probability
distribution $\mathrm{prob}(\rho|N,\{g_a\})$. If this distribution is
sharply peaked at some $\rho_e$ then this $\rho_e$ constitutes a
plausible estimate for $\rho$; and the width of the peak is a measure
for the quality of the estimate. Due to Bayes' rule the
post-measurement distribution is the product of likelihood and prior,

$$\mathrm{prob}(\rho|N,\{g_a\}) \propto \mathrm{prob}(\{g_a\}|N,\rho) \cdot \mathrm{prob}(\rho). \tag{1}$$

The conventional method of maximum likelihood disre- 
gards the prior and focuses on the peak of the likeli- 
hood function. In contrast, the evidence procedure (as 
do other Bayesian methods) seeks to encode in the prior 
whatever ancillary information is available, however qual- 
itative and vague, and use this to improve the accuracy 
and efficiency of the estimate. 

Often prior knowledge comes in the form of a bias towards
some anticipated state $\sigma$ derived from theoretical
considerations or past experience. If this is the only information
available then one expects the prior to be peaked at, and
symmetric around, $\sigma$; yet the degree of confidence as to this
bias, and hence the width of the distribution, are generally not
known. Based on very general considerations of classical
statistical inference [34] which readily carry over to the quantum
case, the prior must be a monotonically decreasing function $f$ of
the distance of $\rho$ from $\sigma$, measured in terms of the
relative entropy:

$$\mathrm{prob}(\rho) = f[S(\rho\|\sigma)], \tag{2}$$

where $S(\rho\|\sigma) := \mathrm{tr}(\rho\ln\rho - \rho\ln\sigma)$.
In the evidence procedure the prior is in fact modelled as an integral

$$\mathrm{prob}(\rho) = \int_0^\infty d\alpha \; \mathrm{prob}(\rho|\alpha) \cdot \mathrm{prob}(\alpha) \tag{3}$$

where

$$\mathrm{prob}(\rho|\alpha) = Z(\alpha)^{-1} \exp[-\alpha S(\rho\|\sigma)] \tag{4}$$

with (in the Gaussian approximation) $Z(\alpha) \sim (2\pi/\alpha)^{n/2}$
and $n = d^2 - 1$. The prior thus features an unknown
hyperparameter $\alpha$ whose value is to be estimated a posteriori.
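
The prior (4) is straightforward to evaluate numerically. The following Python sketch computes the relative entropy and the logarithm of the prior density. The function names are hypothetical, NumPy is assumed to be available, both states are assumed to have full rank, and the Gaussian approximation for $Z(\alpha)$ quoted above is used throughout.

```python
import numpy as np

def matrix_log(rho):
    """Logarithm of a positive-definite Hermitian matrix via eigendecomposition."""
    w, v = np.linalg.eigh(rho)
    return (v * np.log(w)) @ v.conj().T

def relative_entropy(rho, sigma):
    """S(rho||sigma) = tr(rho ln rho - rho ln sigma); assumes full-rank inputs."""
    diff = matrix_log(rho) - matrix_log(sigma)
    return float(np.real(np.trace(rho @ diff)))

def log_prior(rho, sigma, alpha):
    """ln prob(rho|alpha) as in Eq. (4), with Z(alpha) ~ (2 pi / alpha)^(n/2)."""
    d = rho.shape[0]
    n = d * d - 1  # number of real parameters of a d-dimensional state
    log_z = 0.5 * n * np.log(2 * np.pi / alpha)
    return -alpha * relative_entropy(rho, sigma) - log_z

# Illustrative single-qubit example: bias towards total ignorance.
sigma = np.eye(2) / 2
rho = np.diag([0.9, 0.1])
print(relative_entropy(rho, sigma), log_prior(rho, sigma, alpha=10.0))
```

A larger value of `alpha` penalizes deviations from the anticipated state more strongly, which is the sense in which the hyperparameter encodes the degree of confidence in the initial bias.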

Given the experimental data the distribution for $\alpha$ is
updated according to Bayes' rule

$$\mathrm{prob}(\alpha|N,\{g_a\}) \propto \mathrm{prob}(\{g_a\}|N,\alpha) \cdot \mathrm{prob}(\alpha). \tag{5}$$

While the prior $\mathrm{prob}(\alpha)$ is generally broad, reflecting
the vagueness of the initial bias, the likelihood function may
be peaked around some $\alpha_0$ at which

$$\frac{\partial}{\partial\alpha}\,\mathrm{prob}(\{g_a\}|N,\alpha)\Big|_{\alpha=\alpha_0} = 0. \tag{6}$$

Whenever this peak is sufficiently sharp,

$$-\alpha^2 \frac{\partial^2}{\partial\alpha^2} \ln\mathrm{prob}(\{g_a\}|N,\alpha)\Big|_{\alpha=\alpha_0} \gg 1, \tag{7}$$

it is justified to make the approximation

$$\mathrm{prob}(\rho) \approx \mathrm{prob}(\rho|\alpha_0). \tag{8}$$

This amounts to a maximum likelihood approximation
which, in contrast to the conventional approach, no
longer pertains to the estimation of the full quantum
state but to that of the hyperparameter $\alpha$ - whence the
method's alternative names generalized maximum likelihood
or ML-II.

Whenever the approximation (8) is justified, state estimation
can be based on the posterior

$$\mathrm{prob}(\rho|N,\{g_a\},\alpha_0) \propto \mathrm{prob}(\{g_a\}|N,\rho) \cdot \mathrm{prob}(\rho|\alpha_0). \tag{9}$$

Thanks to the quantum Stein lemma [35]-[38] the likelihood
function can for large $N$ be written in the form

$$\mathrm{prob}(\{g_a\}|N,\rho) \propto \exp[-N S(\mu_g^\rho\|\rho)] \tag{10}$$

where $\mu_g^\rho$ is the unique state that minimizes the relative
entropy $S(\mu\|\rho)$ with respect to $\rho$ while yielding as
expectation values of $\{G_a\}$ the observed sample means $\{g_a\}$.
In the Gaussian approximation it is

$$S(\mu_g^\rho\|\rho) \approx S(\rho\|\mu_g^\rho) \approx S(\mu_{g(\rho)}^\sigma\|\mu_g^\sigma) \tag{11}$$

where $\mu_{g(\rho)}^\sigma$ and $\mu_g^\sigma$ now minimize the relative
entropy with respect to the initial bias $\sigma$ rather than $\rho$,
while yielding for $\{G_a\}$ the same expectation values as $\rho$,
$\{g_a(\rho)\}$, or the observed sample means $\{g_a\}$, respectively.
The various $\mu$'s all have a generalized canonical form [39]; e.g.,

$$\mu_g^\sigma \propto \exp\Big[(\ln\sigma - \langle\ln\sigma\rangle_\sigma) - \sum_a \lambda^a G_a\Big] \tag{12}$$

with Lagrange parameters $\{\lambda^a\}$ adjusted such that the
state yields as expectation values of $\{G_a\}$ the observed
sample means. For a typical bias towards total ignorance,
$\sigma = I/d$, this state coincides with the estimate based on
maximum entropy.

The second factor in the posterior (9) is the prior (4).
Like the likelihood function it features in the exponent a
relative entropy, which can be decomposed into

$$S(\rho\|\sigma) = S(\rho\|\mu_{g(\rho)}^\sigma) + S(\mu_{g(\rho)}^\sigma\|\sigma). \tag{13}$$

Summing up the various terms in the exponents of likelihood
and prior, and exploiting the quasi-linearity

$$(1-t)\,S(\rho\|\mu) + t\,S(\rho\|\sigma) = S(\rho\|\rho(\mu,\sigma;t)) + C(\mu,\sigma;t) \tag{14}$$

where $\rho(\mu,\sigma;t) \propto \exp[(1-t)\ln\mu + t\ln\sigma]$ and $C$ is
independent of $\rho$, the posterior finally acquires the form

$$\mathrm{prob}(\rho|N,\{g_a\},\alpha_0) \propto \exp[-\alpha_0 S(\rho\|\mu_{g(\rho)}^\sigma) - (\alpha_0+N)\,S(\mu_{g(\rho)}^\sigma\|\rho_e)]. \tag{15}$$

It is peaked around

$$\rho_e \propto \exp\Big[\frac{\alpha_0}{\alpha_0+N}\ln\sigma + \frac{N}{\alpha_0+N}\ln\mu_g^\sigma\Big] \tag{16}$$

with error bars of the order $O(1/\sqrt{\alpha_0+N})$ as to the degrees
of freedom that have been measured, and $O(1/\sqrt{\alpha_0})$
as to those that have not. The estimate $\rho_e$ has the same
parametric form as the (generalized) Maxent estimate
(12), yet with Lagrange parameters rescaled by a factor
$N/(\alpha_0+N)$. The evidence procedure thus interpolates
between the initial bias and the experimental data: For
$N \ll \alpha_0$ quantum state estimation is dominated by the
prior, and one is advised to stick to the initial bias. For
small sample sizes the zero-eigenvalue problem is thus
avoided. At the opposite extreme, $N \gg \alpha_0$, the rescaling
factor approaches unity; hence as one would expect,
quantum state estimation is then dominated by the likelihood
function (10) and leads to the (generalized) Maxent
estimate (12).
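
The interpolation expressed by (16) can be sketched numerically as follows. The helper names are illustrative assumptions, both the initial bias and the Maxent state are assumed to be given as full-rank density matrices, and the value of the hyperparameter is taken as an input because its determination is a separate step.

```python
import numpy as np

def herm_func(a, func):
    """Apply a scalar function to a Hermitian matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * func(w)) @ v.conj().T

def evidence_estimate(sigma, mu_g, alpha0, n_samples):
    """Peak of the posterior, Eq. (16): weighted interpolation of ln(sigma) and ln(mu_g)."""
    w_prior = alpha0 / (alpha0 + n_samples)
    w_data = n_samples / (alpha0 + n_samples)
    log_rho = w_prior * herm_func(sigma, np.log) + w_data * herm_func(mu_g, np.log)
    rho = herm_func(log_rho, np.exp)
    return rho / np.trace(rho).real  # normalize to unit trace

sigma = np.eye(2) / 2        # bias towards total ignorance
mu_g = np.diag([0.8, 0.2])   # hypothetical Maxent state fitted to the data
rho_e = evidence_estimate(sigma, mu_g, alpha0=1000.0, n_samples=1000)
print(np.round(np.diag(rho_e), 4))
```

With equal weights the populations come out at about 2/3 and 1/3, between the flat prior and the data. Setting `alpha0` to zero returns the Maxent state, and setting `n_samples` to zero returns the initial bias.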

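The optimal value $\alpha_0$ for the hyperparameter is yet
to be determined, and the validity of the (generalized)
maximum likelihood approximation yet to be established.
As for the former, one introduces the marginalization

$$\mathrm{prob}(\{g_a\}|N,\alpha) = \int d\rho \; \mathrm{prob}(\{g_a\}|N,\rho) \cdot \mathrm{prob}(\rho|\alpha) \tag{17}$$

and uses

$$(\partial/\partial\alpha)\,\mathrm{prob}(\rho|\alpha) = [n/(2\alpha) - S(\rho\|\sigma)] \cdot \mathrm{prob}(\rho|\alpha) \tag{18}$$

to derive from the extremum condition (6) an implicit
equation for $\alpha_0$:

$$\langle\alpha_0 S(\rho\|\sigma)\rangle_{\alpha_0} = n/2, \tag{19}$$

where $\langle\ldots\rangle_{\alpha_0}$ denotes the expectation value calculated
with the posterior (15). In the Gaussian approximation
this expectation value can be decomposed into three summands

$$\langle\alpha_0 S(\rho\|\sigma)\rangle_{\alpha_0} = \langle\alpha_0 S(\rho\|\mu_{g(\rho)}^\sigma)\rangle_{\alpha_0} + \langle\alpha_0 S(\mu_{g(\rho)}^\sigma\|\rho_e)\rangle_{\alpha_0} + \alpha_0 S(\rho_e\|\sigma) \tag{20}$$

yielding, respectively,

$$\langle\alpha_0 S(\rho\|\mu_{g(\rho)}^\sigma)\rangle_{\alpha_0} = (n-r)/2, \tag{21}$$

$$\langle\alpha_0 S(\mu_{g(\rho)}^\sigma\|\rho_e)\rangle_{\alpha_0} = \alpha_0/(\alpha_0+N) \cdot r/2, \tag{22}$$

$$\alpha_0 S(\rho_e\|\sigma) = \alpha_0 N^2/(\alpha_0+N)^2 \cdot S(\mu_g^\sigma\|\sigma), \tag{23}$$

provided the $r$ sample means are independent. This leads
to the explicit formula

$$\alpha_0 = (1 - N_{\min}/N)^{-1} \cdot N_{\min} \tag{24}$$

with

$$N_{\min} := r/[2S(\mu_g^\sigma\|\sigma)]. \tag{25}$$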
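
Formulas (24) and (25) translate directly into code. The sketch below uses hypothetical function names and also evaluates the sum of the three summands (21)-(23), so that one can check numerically that the resulting hyperparameter satisfies the implicit equation (19). The input values in the example are illustrative assumptions.

```python
def n_min(r, s_data_prior):
    """Minimum sample size, Eq. (25): r / (2 S(mu_g || sigma))."""
    if s_data_prior <= 0:
        raise ValueError("relative entropy must be positive")
    return r / (2.0 * s_data_prior)

def optimal_alpha(n_samples, r, s_data_prior):
    """Optimal hyperparameter, Eq. (24); only defined for n_samples > N_min."""
    nmin = n_min(r, s_data_prior)
    if n_samples <= nmin:
        raise ValueError("sample too small: N must exceed N_min")
    return nmin / (1.0 - nmin / n_samples)

def lhs_of_eq_19(alpha, n_samples, n, r, s_data_prior):
    """Sum of the three summands (21)-(23) in the Gaussian approximation."""
    weight = alpha / (alpha + n_samples)
    term_unmeasured = (n - r) / 2.0
    term_measured = weight * r / 2.0
    term_shift = alpha * n_samples**2 / (alpha + n_samples) ** 2 * s_data_prior
    return term_unmeasured + term_measured + term_shift

# Illustrative numbers: n = 255 parameters, r = 48 sample means, N = 10,000.
alpha0 = optimal_alpha(10_000, 48, 0.005)
print(n_min(48, 0.005), alpha0, lhs_of_eq_19(alpha0, 10_000, 255, 48, 0.005))
```

For these inputs the last printed number equals n/2 = 127.5 up to rounding, as required by (19).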

In order for the evidence procedure to be applicable
the hyperparameter must be non-negative, and hence the
sample size must exceed $N_{\min}$. In addition, the inequality
(7) must hold. By a similar reasoning as above the
latter translates into an inequality for the variance of the
relative entropy,

$$\mathrm{var}(\alpha_0 S(\rho\|\sigma)) \ll n/2 - 1, \tag{26}$$

the variance again being evaluated in the posterior (15).
As before, in the Gaussian approximation this variance
can be decomposed into three summands

$$\mathrm{var}(\alpha_0 S(\rho\|\sigma)) = \mathrm{var}(\alpha_0 S(\rho\|\mu_{g(\rho)}^\sigma)) + \mathrm{var}(\alpha_0 S(\mu_{g(\rho)}^\sigma\|\rho_e)) + 4\alpha_0^2 \langle(\langle\sigma-\rho_e;\rho-\rho_e\rangle_{\rho_e})^2\rangle_{\alpha_0} \tag{27}$$

where in the last summand $\langle\cdot\,;\cdot\rangle_{\rho_e}$ denotes the scalar
product induced by the entropy [41], evaluated at
$\rho_e$. These three terms yield, respectively,

$$\mathrm{var}(\alpha_0 S(\rho\|\mu_{g(\rho)}^\sigma)) = (n-r)/2, \tag{28}$$

$$\mathrm{var}(\alpha_0 S(\mu_{g(\rho)}^\sigma\|\rho_e)) = \alpha_0^2/(\alpha_0+N)^2 \cdot r/2, \tag{29}$$

$$4\alpha_0^2 \langle(\langle\sigma-\rho_e;\rho-\rho_e\rangle_{\rho_e})^2\rangle_{\alpha_0} = 2\alpha_0^2 N^2/(\alpha_0+N)^3 \cdot S(\mu_g^\sigma\|\sigma). \tag{30}$$

Inserting for the hyperparameter the formula (24) then
gives the criterion

$$r/2 \cdot (1 - N_{\min}/N)^2 \gg 1; \tag{31}$$

i.e., the evidence procedure presupposes that the set of
observables is large, $r \gg 1$. Thus the evidence procedure
is adapted in particular to the tomography of quantum
systems that are large.
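
The step from the three variance terms (28)-(30) to the criterion (31) can be spelled out as follows. This is a sketch of the algebra only; the shorthand $x := N_{\min}/N$ is an assumption introduced here for brevity and is not used elsewhere in the text.

```latex
% Shorthand (assumption of this sketch): x := N_min / N, with 0 < x < 1.
\begin{align*}
\frac{\alpha_0}{\alpha_0+N} &= x, \qquad
S(\mu_g^\sigma\|\sigma) = \frac{r}{2N_{\min}}
  && \text{from (24), (25)} \\
\mathrm{var}(\alpha_0 S(\rho\|\sigma))
  &= \frac{n-r}{2} + x^2\,\frac{r}{2} + 2x(1-x)\,\frac{r}{2}
  && \text{from (27)--(30)} \\
\frac{n}{2} - \mathrm{var}(\alpha_0 S(\rho\|\sigma))
  &= \frac{r}{2}\,\bigl(1 - 2x + x^2\bigr)
   = \frac{r}{2}\,(1-x)^2 \gg 1
  && \text{inserted into (26)}
\end{align*}
```

The third variance term uses $\alpha_0^2 N^2/(\alpha_0+N)^3 = x^2 N(1-x)$, which together with $S(\mu_g^\sigma\|\sigma) = r/(2N_{\min})$ gives $r\,x(1-x)$.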

Combining the last inequality once more with the formula
(24) for the optimal hyperparameter shows that the
latter must be inside the range

$$N_{\min} < \alpha_0 \ll \sqrt{r/2} \cdot N_{\min} \tag{32}$$

and hence approximately of the same size as $N_{\min}$.
Consequently, fluctuations in hitherto unmeasured degrees
of freedom are effectively estimated to be of the order
$O(1/\sqrt{N_{\min}})$; the hitherto unmeasured degrees of freedom
are estimated to an accuracy as if they had been
measured on a sample of size $N_{\min}$. This fictitious sample
size $N_{\min}$ becomes large whenever $S(\mu_g^\sigma\|\sigma)$ is small, i.e.,
whenever the experimental data are in good agreement
with prior expectations overall. In this case the data, albeit
informationally incomplete, provide a high degree of
confidence not only as to the degrees of freedom actually
measured but also as to those that have not. If one seeks
confirmation or only incremental amendment of prior
expectations, therefore, it is not necessary to measure a set
of observables that is informationally complete.

III. EXAMPLE: FOUR QUBITS

In this Section I apply the evidence procedure to in- 
complete tomography of a four-qubit system. The exam- 
ple is intentionally simplified in order to make all calcu- 
lations tractable analytically. 

An $M$-qubit system (say, $M$ entangled photons) is described
in a Hilbert space of dimension $d = 2^M$. Complete
state tomography would require measurement of
$n = 2^{2M} - 1$ different observables, i.e., demand resources
that grow exponentially with the number of qubits. As
$M$ increases, this quickly becomes impossible in real
experiments. Here I suppose that in an actual experiment
one can only determine sample means $\{s_i\}$ of the
single-qubit spin vectors

$$S_i := (X_i, Y_i, Z_i)/2, \tag{33}$$

where $X_i, Y_i, Z_i$ are the Pauli matrices pertaining to
qubit $i$, and sample means $\{c_{ij}\}$ of the qubit-qubit
correlation matrices with components

$$C_{ij}^{ab} := (S_i^a S_j^b + S_i^b S_j^a)/2, \quad a,b = 1 \ldots 3, \; i \neq j. \tag{34}$$

Together these are $r = 3M^2$ independent sample means,
a number growing only quadratically with the number of
qubits. For four qubits it is $r = 48$, less than 20% of an
informationally complete set ($n = 255$), but still much
greater than one and hence within the range where one
can apply the evidence procedure.
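
The parameter counting above can be checked with a few lines of Python. The sketch assumes that each qubit contributes three spin components and that each unordered pair of qubits contributes a symmetric 3x3 correlation matrix with six independent components; this is the counting that reproduces the stated total. The function name is hypothetical.

```python
def count_parameters(n_qubits):
    """Return (d, n, r): Hilbert space dimension, number of parameters of a
    complete state description, and number of measured sample means."""
    d = 2 ** n_qubits
    n = d * d - 1
    r_single = 3 * n_qubits                 # spin vector of each qubit
    n_pairs = n_qubits * (n_qubits - 1) // 2
    r_pairs = 6 * n_pairs                   # assumed: symmetric 3x3 matrix per pair
    return d, n, r_single + r_pairs

for m in (2, 4, 8):
    d, n, r = count_parameters(m)
    print(m, d, n, r, round(r / n, 4))
```

For eight qubits the measured fraction `r / n` drops below one percent, which illustrates how quickly the gap between the quadratic and the exponential growth opens up.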

Before collecting experimental data and applying the
evidence procedure one must reveal one's initial bias $\sigma$.
In this example I assume that there is no prior information
available about a specific order of the qubits, so $\sigma$
must exhibit the symmetry

$$\langle S_i\rangle_\sigma = s(\sigma) \;\; \forall i, \quad \langle C_{ij}^{ab}\rangle_\sigma = c^{ab}(\sigma) \;\; \forall i \neq j. \tag{35}$$

Moreover, there shall be no reason a priori to prefer any
particular spatial direction, an indifference which must
be reflected in the initial bias being isotropic:

$$s(\sigma) = 0, \quad c^{ab}(\sigma) = c(\sigma) \cdot \delta^{ab}. \tag{36}$$

Yet I do assume that - based on either theoretical
considerations or past experience with similar experimental
setups - one has reason to believe that the qubits are
correlated, and hence $c(\sigma) \neq 0$. The initial bias is thus
specified by a single parameter characterizing the expected
strength of correlations. This parameter is related to the
expected size of the total angular momentum

$$J := \sum_{i=1}^{M} S_i \tag{37}$$

via

$$\langle J^2\rangle = 3M(M-1) \cdot c + 3M/4. \tag{38}$$

Since the latter expectation value must lie in the range
between zero and $M/2 \cdot (M/2 + 1)$, the strength of
correlations is bounded by

$$-1/[4(M-1)] \le c \le 1/12. \tag{39}$$

For four qubits, these bounds are $\pm 1/12$.

Given that the initial bias can be characterized completely
by the expected magnitude of the total angular
momentum, it is possible to write $\sigma$ in Maxent form

$$\sigma = Z(\lambda(\sigma))^{-1} \exp[-\lambda(\sigma) \cdot J^2] \tag{40}$$

with properly adjusted Lagrange parameter $\lambda(\sigma)$ and
partition function

$$Z(\lambda) := \mathrm{tr}\exp[-\lambda \cdot J^2]. \tag{41}$$

For four qubits this partition function can be calculated
and all resulting thermodynamic equations solved analytically.
Adding four spins 1/2 the total angular momentum
squared can take the values $j(j+1)$ with $j = 0, 1, 2$.
In the Clebsch-Gordan series these spin-$j$ representations
occur with respective multiplicities $m_0 = 2$, $m_1 = 3$ and
$m_2 = 1$; and each such representation has dimension
$2j + 1$ [42]. Therefore,

$$Z(\lambda) = \sum_{j=0}^{2} m_j (2j+1) \exp[-\lambda j(j+1)] = 2 + 9\exp(-2\lambda) + 5\exp(-6\lambda); \tag{42}$$

whence follows the expectation value

$$\langle J^2\rangle = -\partial\ln Z/\partial\lambda = 6 \cdot \frac{3\exp(-2\lambda) + 5\exp(-6\lambda)}{2 + 9\exp(-2\lambda) + 5\exp(-6\lambda)}. \tag{43}$$
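
The multiplicities entering the partition function can be cross-checked by brute force. The following sketch builds the 16x16 matrix of the squared total angular momentum for four spins 1/2 with NumPy, counts its eigenvalues, and compares the numerically evaluated trace with the closed form (42). All function names are hypothetical.

```python
from functools import reduce
import numpy as np

PAULIS = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def site_operator(op, site, n_qubits):
    """Embed a single-qubit operator at position `site` of an n-qubit register."""
    factors = [np.eye(2, dtype=complex) for _ in range(n_qubits)]
    factors[site] = op
    return reduce(np.kron, factors)

def total_j2(n_qubits):
    """J^2 = Jx^2 + Jy^2 + Jz^2, with J the sum of the single-qubit spins."""
    dim = 2 ** n_qubits
    j2 = np.zeros((dim, dim), dtype=complex)
    for pauli in PAULIS:
        j_comp = sum(site_operator(pauli / 2, i, n_qubits) for i in range(n_qubits))
        j2 += j_comp @ j_comp
    return j2

def z_numeric(lam, n_qubits=4):
    """Partition function tr exp(-lam J^2) from the spectrum of J^2."""
    evals = np.linalg.eigvalsh(total_j2(n_qubits))
    return float(np.sum(np.exp(-lam * evals)))

def z_closed(lam):
    """Closed form (42) for four qubits."""
    return 2 + 9 * np.exp(-2 * lam) + 5 * np.exp(-6 * lam)

evals = np.round(np.linalg.eigvalsh(total_j2(4))).astype(int)
print(dict(zip(*np.unique(evals, return_counts=True))))
print(z_numeric(0.18), z_closed(0.18))
```

The eigenvalue 0 occurs twice, the eigenvalue 2 nine times and the eigenvalue 6 five times, in line with the multiplicities and dimensions quoted above.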

Given the strength of correlations $c$ one can now arrive
at the associated $\lambda$ in the following three steps: relate $c$
to $\langle J^2\rangle$; then solve the thermodynamic equation (43) for
$x := \exp(-2\lambda)$, a cubic equation with a unique analytical
solution; and finally obtain $\lambda = -\ln x/2$. In my example
I shall assume a prior bias $c(\sigma) = -0.02$ with associated
$\langle J^2\rangle_\sigma = 2.28$, $x(\sigma) \approx 0.7$ and $\lambda(\sigma) \approx 0.18$, implying
$Z(\lambda(\sigma)) \approx 10$.
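
The three steps can be sketched in Python as follows. Instead of the analytical solution of the cubic, the sketch uses the numerical root finder `numpy.roots` and keeps the positive real root; this substitution, like the function names, is an assumption made for brevity.

```python
import numpy as np

def j2_from_c(c, n_qubits=4):
    """Step 1: expected J^2 from the correlation strength, Eq. (38)."""
    return 3 * n_qubits * (n_qubits - 1) * c + 3 * n_qubits / 4

def x_from_j2(j2):
    """Step 2: solve Eq. (43) for x = exp(-2 lambda).

    Clearing the denominator gives (30 - 5 j2) x^3 + (18 - 9 j2) x - 2 j2 = 0.
    """
    roots = np.roots([30 - 5 * j2, 0.0, 18 - 9 * j2, -2 * j2])
    positive = [root.real for root in roots if abs(root.imag) < 1e-9 and root.real > 0]
    return max(positive)

def lambda_from_c(c):
    """Step 3: Lagrange parameter lambda = -ln(x) / 2."""
    return -np.log(x_from_j2(j2_from_c(c))) / 2

def partition_function(lam):
    """Closed form (42) for four qubits."""
    return 2 + 9 * np.exp(-2 * lam) + 5 * np.exp(-6 * lam)

lam_sigma = lambda_from_c(-0.02)
print(j2_from_c(-0.02), x_from_j2(j2_from_c(-0.02)), lam_sigma, partition_function(lam_sigma))
```

The printed values agree with the rounded numbers quoted in the text for the prior bias.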

Enter experimental data, measured on a sample of size
$N = 10{,}000$. In order to keep calculations tractable
analytically I make the (admittedly artificial) assumption
that the $r = 48$ different sample means confirm the
anticipated symmetries perfectly, and that hence the
experimental data, too, can be characterized by the single
parameter $c$. But the measured correlation strength
differs from prior expectation, $c \neq c(\sigma)$; say, by 25%,
$c = -0.025$. Associated with this experimental value is a
Maxent state $\mu_g^\sigma$ of the same canonical form as $\sigma$, yet with
different expectation and parameter values $\langle J^2\rangle = 2.1$,
$x \approx 0.625$ and $\lambda \approx 0.235$, which implies $Z(\lambda) \approx 8.85$.
This empirical Maxent state deviates from the initial bias
by the distance

$$S(\mu_g^\sigma\|\sigma) = (\lambda(\sigma) - \lambda) \cdot \langle J^2\rangle + \ln Z(\lambda(\sigma)) - \ln Z(\lambda) \approx 0.005. \tag{44}$$

One can now deduce $N_{\min} \approx 5{,}000$ and hence confirm
that the two criteria for the applicability of the evidence
procedure ($N > N_{\min}$ and $r \gg 1$) are indeed satisfied.
From $N \approx 2N_{\min}$ results an optimal hyperparameter
$\alpha_0$ of approximately the same size as the sample,
$\alpha_0 \approx N$. Hence the estimate (16) gives approximately
equal weight to likelihood and prior; it features an
interpolated Lagrange parameter $\lambda_e \approx 0.208$ which
corresponds to $x_e \approx 0.66$, $\langle J^2\rangle_e \approx 2.19$ and $c_e \approx -0.0226$.
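
The numbers of this example can be reproduced end to end with a short script. The sketch below uses the closed-form partition function for four qubits, a numerical root finder in place of the analytical solution of the cubic, and the fact that the interpolation (16) reduces to a weighted mean of the two Lagrange parameters because both states are functions of the same operator. Variable and function names are hypothetical.

```python
import numpy as np

M_QUBITS, R_MEANS, N_SAMPLES = 4, 48, 10_000

def partition_function(lam):
    return 2 + 9 * np.exp(-2 * lam) + 5 * np.exp(-6 * lam)

def j2_of_lambda(lam):
    x = np.exp(-2 * lam)
    return 6 * (3 * x + 5 * x**3) / (2 + 9 * x + 5 * x**3)

def j2_of_c(c):
    return 3 * M_QUBITS * (M_QUBITS - 1) * c + 3 * M_QUBITS / 4

def lambda_of_c(c):
    """Invert <J^2>(lambda) numerically via the cubic in x = exp(-2 lambda)."""
    j2 = j2_of_c(c)
    roots = np.roots([30 - 5 * j2, 0.0, 18 - 9 * j2, -2 * j2])
    x = max(root.real for root in roots if abs(root.imag) < 1e-9 and root.real > 0)
    return -np.log(x) / 2

lam_sigma = lambda_of_c(-0.02)    # prior bias
lam_data = lambda_of_c(-0.025)    # measured correlation strength

# Relative entropy between empirical Maxent state and initial bias, Eq. (44).
s_dist = ((lam_sigma - lam_data) * j2_of_c(-0.025)
          + np.log(partition_function(lam_sigma))
          - np.log(partition_function(lam_data)))

n_min = R_MEANS / (2 * s_dist)                # Eq. (25)
alpha0 = n_min / (1 - n_min / N_SAMPLES)      # Eq. (24)

# Interpolated Lagrange parameter following from Eq. (16).
lam_e = (alpha0 * lam_sigma + N_SAMPLES * lam_data) / (alpha0 + N_SAMPLES)
c_e = (j2_of_lambda(lam_e) - 3 * M_QUBITS / 4) / (3 * M_QUBITS * (M_QUBITS - 1))

print(s_dist, n_min, alpha0, lam_e, j2_of_lambda(lam_e), c_e)
```

Without intermediate rounding the minimum sample size comes out slightly below 5,000 and the hyperparameter slightly below the sample size, consistent with the approximate values quoted above.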

This example illustrates nicely the two distinguishing 
traits of the evidence procedure. First, the estimate is 
not based on experimental data alone; rather, it takes 
into account prior information encoded in the initial bias 
$\sigma$. In fact, in my example the initial bias and experimental
data carried approximately equal weight, so the
estimate $c_e$ for the strength of correlations was
approximately half way between its measured value and the prior
expectation. Second, even though in my example the set 
of sample means was far from informationally complete, 
the resulting estimate was just as good as if it had been 
based on a set that was informationally complete. In- 
deed, since the experimental data were sufficiently close 
to prior expectations (in particular with regards to the 
anticipated symmetries) the hyperparameter took a value 
of similar magnitude as the sample size; and hence esti- 
mates pertaining to degrees of freedom that had not been 
measured became just as accurate as estimates pertain- 
ing to those that had been measured. 

IV. CONCLUSION 

In this paper I transferred the evidence procedure from 
its historical use in image reconstruction and other clas- 
sical estimation tasks to a novel use in quantum state 
tomography. I verified that this transfer is justified
whenever samples are large enough (greater than $N_{\min}$)
and the set of measured sample means is sufficiently big
($r \gg 1$), albeit not necessarily informationally complete.
The power of the evidence procedure lies in its optimal 
use of prior information: Whenever such prior informa- 
tion is available, the procedure promises to improve both 
the accuracy and the efficiency of quantum state estima- 
tion. Indeed, I demonstrated in the four-qubit example 
that the evidence procedure yields a state estimate that 
is modified with respect to the conventional maximum 
likelihood or Maxent estimate, by giving some credence
still to one's initial bias. The relative weight attributed
to this initial bias is determined by the degree to which 
the experimental data meet prior expectations overall; in 
my example this degree was high because all anticipated 
symmetries were confirmed perfectly. 

In addition to providing a state estimate the evidence 
procedure also quantifies the associated error bars; and it 
does so not only with respect to measured but also with 
respect to unmeasured degrees of freedom. Whenever
the data are in good agreement with prior expectations
overall - as was the case in my example - and hence the
optimal hyperparameter $\alpha_0$ is of similar magnitude as the
sample size $N$, the unmeasured degrees of freedom are in
fact estimated to an accuracy which is just as good as 
if they had been measured, too. In situations where the 
primary goal of an experiment is to confirm or incremen- 
tally amend well-founded prior expectations, therefore, 
quantum state estimation becomes very efficient: Mea- 
surement of a limited, not informationally complete set 
of observables will suffice to estimate the state at a level 
of confidence extending uniformly to all degrees of free- 
dom, even the unmeasured ones. I have not yet consid- 
ered experimental errors introduced by inaccuracies of 
the measurement apparatus. Their treatment should be 
straightforward conceptually, and will be the subject of 
future work. 

Finally, I speculate whether the evidence procedure 
may also have some import on foundational issues in sta- 
tistical mechanics. The thermodynamical description of 
macroscopic systems by means of (generalized) canon- 
ical distributions always represents a theoretical model 
which can never be verified in its entirety — after all, who 
would ever verify experimentally, say, the predicted 42- 
body correlations in a liquid? Nevertheless, we use ther- 
modynamical models with a high degree of confidence. 
The evidence procedure might provide a justification for 
this: As long as the predictions of the thermodynamical 
model are verified for some limited set of observables, one 
has good reason to trust its predictions for all the other 
observables, too. 